ericmjl commented Mar 25, 2025

  • Implemented the 'generate_synthetic' command to create synthetic datasets for various tasks.
  • Added the 'list_synthetic_tasks' command to display available synthetic data tasks and their descriptions.
  • Enhanced the CLI with options for task-specific parameters and data splitting.
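
For readers unfamiliar with this kind of CLI layout, here is a minimal sketch of how two such commands could be wired up. It is not the code in this PR: the PR does not say which CLI library is used, so the sketch assumes the standard-library `argparse`. Only the command names `generate_synthetic` and `list_synthetic_tasks` come from the description above. The `TASKS` registry, the `--n-samples`, `--param`, and `--train-fraction` options, and the helpers `parse_params` and `split_counts` are hypothetical.

```python
import argparse

# Hypothetical task registry; the real task names and descriptions
# are not listed in this PR.
TASKS = {
    "example_task": "Placeholder description for an illustrative task.",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per CLI command."""
    parser = argparse.ArgumentParser(prog="synthetic-demo")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("list_synthetic_tasks", help="Show available tasks.")

    gen = sub.add_parser("generate_synthetic", help="Generate a dataset.")
    gen.add_argument("task", choices=sorted(TASKS))
    # Assumed option names, chosen for illustration only.
    gen.add_argument("--n-samples", type=int, default=100)
    gen.add_argument("--param", action="append", default=[], metavar="KEY=VALUE")
    gen.add_argument("--train-fraction", type=float, default=0.8)
    return parser


def parse_params(pairs):
    """Turn repeated KEY=VALUE strings into a dict of task-specific parameters."""
    params = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        params[key] = value
    return params


def split_counts(n_samples, train_fraction):
    """Return (n_train, n_test) for a simple two-way split."""
    n_train = round(n_samples * train_fraction)
    return n_train, n_samples - n_train


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "list_synthetic_tasks":
        for name, description in TASKS.items():
            print(f"{name}: {description}")
    else:
        n_train, n_test = split_counts(args.n_samples, args.train_fraction)
        print(args.task, parse_params(args.param), n_train, n_test)
```

Calling `main(["list_synthetic_tasks"])` prints one line per registered task. Collecting task-specific parameters as repeated `KEY=VALUE` pairs keeps the parser independent of any single task's parameter set.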

ericmjl added 2 commits March 24, 2025 21:53
- Implemented the 'generate_synthetic' command to create synthetic datasets for various tasks.
- Added the 'list_synthetic_tasks' command to display available synthetic data tasks and their descriptions.
- Enhanced the CLI with options for task-specific parameters and data splitting.
- Introduced a monkey patch for the `generate_random_sequences` function to ensure the custom alphabet is used.
- Removed the alphabet parameter from task-specific parameters to prevent conflicts.
- Ensured the original function is restored after dataset generation or upon encountering errors.
ericmjl merged commit c1ffbce into main Mar 25, 2025
5 checks passed